Lazy ETL in Action: ETL Technology Dates Scientific Data

Authors

  • Yağız Kargın
  • Milena Ivanova
  • Ying Zhang
  • Stefan Manegold
  • Martin L. Kersten

Abstract

Both scientific data and business data have analytical needs. Analysis takes place after a scientific data warehouse is eagerly filled with all data from external data sources (repositories). This is similar to the initial loading stage of the Extract, Transform, and Load (ETL) processes that drive business intelligence, and ETL can also help scientific data analysis. However, initial loading is a time- and resource-consuming operation, and it might not be entirely necessary, e.g., if the user is interested in only a subset of the data. We propose to demonstrate Lazy ETL, a technique that lowers the cost of initial loading by integrating ETL into the query processing of the scientific data warehouse. For each query, only the required data items are extracted, transformed, and loaded, transparently and on the fly. The demo is built around concrete implementations of Lazy ETL for seismic data analysis. The seismic data warehouse is ready for query processing without waiting for a long initial load. The audience issues analytical queries to observe the internal mechanisms and modifications that realize each of the steps: lazy extraction, lazy transformation, and lazy loading.
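As a rough illustration of the idea above, the following Python sketch models a warehouse that, as one assumed design, eagerly loads only cheap per-file metadata and defers extraction, transformation, and loading of the actual samples until a query touches a file. The repository layout, the `scale` conversion, and the names `LazyWarehouse`, `_lazy_load`, and `query_peak` are hypothetical; this is a minimal model of the technique, not the authors' implementation inside a database query engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical external repository: file name -> (station id, raw integer counts).
# It stands in for seismic files on disk; the layout is purely illustrative.
Repository = Dict[str, Tuple[str, List[int]]]


@dataclass
class LazyWarehouse:
    repo: Repository
    scale: float = 0.5  # hypothetical transformation: raw counts -> physical units
    metadata: Dict[str, str] = field(default_factory=dict)        # file -> station
    loaded: Dict[str, List[float]] = field(default_factory=dict)  # file -> samples
    extractions: int = 0  # number of files actually read from the repository

    def __post_init__(self) -> None:
        # Eager step (assumed design): load only the cheap per-file metadata.
        for name, (station, _) in self.repo.items():
            self.metadata[name] = station

    def _lazy_load(self, name: str) -> List[float]:
        """Extract, transform, and load one file the first time a query needs it."""
        if name not in self.loaded:
            _, raw = self.repo[name]                  # lazy extraction
            self.extractions += 1
            samples = [c * self.scale for c in raw]   # lazy transformation
            self.loaded[name] = samples               # lazy loading (kept for reuse)
        return self.loaded[name]

    def query_peak(self, station: str) -> Optional[float]:
        """Peak absolute amplitude for one station; reads only that station's files."""
        files = [n for n, s in self.metadata.items() if s == station]
        values = [abs(v) for n in files for v in self._lazy_load(n)]
        return max(values) if values else None


if __name__ == "__main__":
    repo = {
        "f1": ("ST1", [1, -4, 2]),
        "f2": ("ST2", [10, 3]),
        "f3": ("ST1", [6, 0]),
    }
    wh = LazyWarehouse(repo)
    print(wh.query_peak("ST1"), wh.extractions)  # 3.0 2  (f2 is never read)
    print(wh.query_peak("ST1"), wh.extractions)  # 3.0 2  (served from loaded data)
```

Because loaded samples are kept in `loaded`, a repeated query reads nothing further from the repository, and files that no query touches (here `f2`) are never extracted at all, which is the cost saving the abstract describes.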

Similar resources

Instant-On Scientific Data Warehouses - Lazy ETL for Data-Intensive Research

In the dawn of the data intensive research era, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. Similar to classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time and resource intensive an...

The Conceptual Modeling of ETL Processes

An ETL process includes various ETL activities, such as filtering, aggregating, checking for null values, etc., which can be represented by the constraint functions and transforming operations defined in previous section. However, the activities cannot exist in an ETL process independently; they must be organized in certain order that is specified in an ETL task of the ETL process. We think tha...

ETL Extract, Transform and Load (ETL) Performance Improved by Query Cache

Extraction, Transformation, and Loading (ETL) processes are responsible for the operations taking place in the back stage of a data warehouse architecture. Extract, transform and load (ETL) is the core process of data integration and is typically associated with data warehousing. ETL tools extract data from a chosen source, transform it into new formats according to business rules, and then load...

Improve Performance of Extract, Transform and Load (ETL) in Data Warehouse

Extract, transform and load (ETL) is the core process of data integration and is typically associated with data warehousing. ETL tools extract data from a chosen source, transform it into new formats according to business rules, and then load it into target data structure. Managing rules and processes for the increasing diversity of data sources and high volumes of data processed that ETL must ...

Data Quality Problems in ETL: The State of the Practice in Large Organisations

This paper presents a review of the data quality problems that arise because of Extract, Transform and Load (ETL) technology in large organisations by observing the context in which the ETL is deployed. Using a case study methodology, information about the data quality problems and their context arising from deployments in six large organisations is reported. The findings indicate that ETL depl...

Journal:
  • PVLDB

Volume: 6

Publication date: 2013